Spiral me to the core: Getting a visual grasp on text corpora through clusters and keywords
Abstract
The amount of literature within a research domain is ever growing, making it difficult to stay on top of everything. Getting a grasp on the important topics and areas within a domain, or even knowing where to start, is often tough and tedious. This paper therefore presents a visualization, the cluster spiral, which offers a fast yet plain and simple way of exploring the content of large text collections.
Related resources
Notes in Artificial Intelligence 7499 Subseries of Lecture Notes in Computer Science
Corpora are not easy to get a handle on. The usual way of getting to grips with text is to read it, but corpora are mostly too big to read (and not designed to be read). We show, with examples, how keyword lists (of one corpus vs. another) are a direct, practical and fascinating way to explore the characteristics of corpora, and of text types. Our method is to classify the top one hundred keywo...
VerseVis: Visualization of Spoken Features in Poetry
The exploration and analysis of literary corpora is a difficult task. Previous approaches to this problem focused on mining data directly from text. However, these solutions do not aid researchers who are interested in learning spoken features of the text, which play an important role in poetic works. VerseVis is a text visualization tool that gives users the ability to identify interesting tex...
Published vs. Postgraduate Writing in Applied Linguistics: The Case of Lexical Bundles
Abstract: Lexical bundles, as building blocks of coherent discourse, have been the subject of much research in the last two decades. While many of such studies have been mainly concerned with exploring variations in the use of these word sequences across different registers and disciplines, very few have addressed the use of some particular groups of lexical bundles within some gen...
Using it Bundles in Published and Unpublished Writings
Lexical bundles are known as important elements of coherent discourse that have been the subject of much research. While the previous research has been mainly concerned with exploring variations in the use of these word sequences across different registers and disciplines, very few studies have addressed the use of some particular groups of lexical bundles within some types of academic writing....
Getting to Know Your Corpus
Corpora are not easy to get a handle on. The usual way of getting to grips with text is to read it, but corpora are mostly too big to read (and not designed to be read). We show, with examples, how keyword lists (of one corpus vs. another) are a direct, practical and fascinating way to explore the characteristics of corpora, and of text types. Our method is to classify the top one hundred keywo...